Step-by-step guide: fine-tune an LLM on your texts (part 1)

I recently wrote about my holiday project to fine-tune a pre-trained Llama 2 model on my entire 240,000 text message history, with pleasingly good results. And now, I present a step-by-step guide so that you, too, can have a substantive conversation with yourself…

There are three prerequisites:

  1. You need to have an iPhone with a respectable amount of SMS, iMessage or WhatsApp history — at least a few thousand messages. If you haven’t kept your texts, clear your calendar for the weekend ahead and prepare to be chatty. Apologies to Team Android: I’m not sure how you get your hands on your messages. But I’d love to work together on this if you’re interested.
  2. A budget between $50 and $100, or your local equivalent.
  3. Working knowledge of Python, and at least foundational knowledge of the Data Science concepts involved: pre-trained transformers, tokens, text generation and all that. There’s a reading list at the end of this post.

There is a fourth requirement: patience and enthusiasm! It’s about the journey, not the destination etc etc. (But the destination was pretty great in my case).

I’ve tried out this guide in the US and UK, but not in other regions. Please contact me if you encounter snafus so we can work on it together.

Preliminaries

If you don’t already have these, you should set up:

  • A free Hugging Face account, to access the base Llama 2 model, and privately store your dataset and fine-tuning progress.
  • A free Weights and Biases account, to visualize your training progress. (A quick setup check follows this list.)
  • A Google Colab plan. I started with the Pay as you Go option, but upgraded to the Pro+ option during my adventure. You can use any similar alternative to Google Colab, or even run locally on your box if you have a GPU handy.
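If you'd like to confirm the Weights and Biases piece is working before you need it, here's a minimal check you can run from any notebook (wandb will prompt you for the API key from your W&B account settings):

# Install and sign in to Weights and Biases
!pip install wandb

import wandb
wandb.login()  # prompts for your W&B API key on first run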

Time to teach a Llama new tricks

Llama (“Large Language Model Meta AI”) surely needs no introduction around these parts. It’s the open-source family of auto-regressive language models from Meta; the Chat variants we’ll use are tuned with supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF). Llama 2 was released in July 2023 and comes in sizes ranging from 7B to 70B parameters. We’ll start with the 7B parameter version, then upgrade to 13B when results look decent.

If you haven’t already, you need to request access to the Llama model:

  1. Visit Meta’s request form, select the Llama 2 & Llama Chat model, and accept Meta’s terms and conditions. The email address you enter on this form needs to match your Hugging Face account email.
  2. In theory you now need to wait 1-2 days for the email confirmation from Meta. In practice, I received mine within hours, and I hear others had the same experience.
  3. Then, visit the model page on Hugging Face, and request access.

How does it look before we fine-tune?

I’ve made this basic Notebook so you can test out the setup using Google Colab, preferably on a V100 instance. Start with pip installs:

# Install libraries
!pip install transformers accelerate sentencepiece bitsandbytes
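Before going further, it’s worth confirming that Colab has actually attached a GPU to your runtime:

# Confirm a GPU is attached to the runtime
import torch
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
else:
    print("No GPU detected: change the Colab runtime type to a GPU instance")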

Next, go to the Hugging Face website, and from the Profile menu, select Settings, then Access Tokens. Set up a new token. Return to the Notebook, run this code, then paste in your token. Your Notebook is now signed in to Hugging Face.

# Sign in to Hugging Face

from huggingface_hub import notebook_login
notebook_login()
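If you want to double-check that the login took, huggingface_hub can report which account the notebook is authenticated as:

# Verify the notebook is signed in to the right account
from huggingface_hub import whoami
print(whoami()["name"])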

It’s time to load the 7B Llama model. We’ll be using the Chat variant.

# Load the model

import torch
import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

base_model_name = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # load the weights quantized to 4 bits
    bnb_4bit_quant_type="nf4",              # NormalFloat4, the data type from the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,  # carry out computation in bfloat16
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
)

base_model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    quantization_config=quant_config,
    device_map="auto",
)
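Thanks to the 4-bit quantization, the 7B model fits comfortably in GPU memory. If you’re curious about the actual footprint, transformers can report it directly:

# Report the quantized model's memory footprint
print(f"Model footprint: {base_model.get_memory_footprint() / 1e9:.1f} GB")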

Finally, we use Hugging Face’s text-generation pipeline to imagine a conversation. I’ll explain the format of our query, including the <<SYS>> and [INST] tags, in the next post in this series. If you’ve worked with Llama 2 before, you may notice that I haven’t stuck to the officially recommended prompt format: I found this structure performed slightly better.

# Ask the model to continue a basic Text Message exchange

query = "<s><<SYS>>Write a realistic text message chat. Avoid repetition.<</SYS>>\n"
query += "[INST]Write a chat between John and Jane[/INST]\n"
query += "### John: Hi, Jane. How are you?\n"
query += "### Jane: "
text_gen = transformers.pipeline(
    task="text-generation",
    model=base_model,
    tokenizer=tokenizer,
    max_length=300,
)
output = text_gen(query)
print(output[0]['generated_text'])

Here’s the result I get. Right out of the box, the 7B Llama 2 comes up with something perfectly believable.

<s><<SYS>>Write a realistic text message chat. Avoid repetition.<</SYS>>
[INST]Write a chat between John and Jane[/INST]
### John: Hi, Jane. How are you?
### Jane: Hey, John! I'm doing well, thanks for asking! Just got back from a great hike with some friends. How about you?
### John: Nice! I've been pretty busy with work lately. Had a big project due last night and it was a bit of a sprint to the finish line. But I'm glad it's over. How was your hike?
### Jane: It was amazing! We saw a bunch of deer and even a bear! I was a bit scared at first, but my friends were super supportive and we all made it back safely. How's your work going? Any exciting projects on the horizon?
### John: Yeah, work's been pretty hectic lately. But I'm hoping to take on some more responsibilities soon. I'm actually thinking of starting my own business on the side. It's a bit of a risk, but I think it could be really rewarding. How do you think?
### Jane: That sounds really exciting! I'm sure you'll do great. Just make sure to take some time for
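If your continuation comes out flat or repetitive, the pipeline passes the usual sampling parameters straight through to generate(). The values below are illustrative starting points, not tuned settings:

# Optional: enable sampling for more varied output (values are illustrative)
output = text_gen(
    query,
    do_sample=True,           # sample rather than greedy decoding
    temperature=0.8,          # soften the token distribution
    top_p=0.9,                # nucleus sampling
    repetition_penalty=1.15,  # discourage verbatim loops
)
print(output[0]['generated_text'])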

Prepare to spend half your budget

Our next stop is a super handy utility called iMazing, available for Mac and PC, which will give you access to all your texts for $40, and unlimited local backups of your phone as an added bonus.

When you’ve installed iMazing:

  1. Connect your phone to your Mac or PC and choose to take a backup
  2. Select Messages, click the first sender at the top of the list, scroll to the bottom, then shift-click the final sender so that all senders are selected
  3. Click ‘Export to CSV’, leave ‘Include header row’ checked, leave ‘Merge all chat sessions into one file’ selected, and click Next
  4. Choose a filename like ‘all_text_messages.csv’ and save (we’ll sanity-check this file right after this list)
  5. Repeat for WhatsApp. I haven’t tried other chat apps but I assume most will work the same way.
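Before moving on, it’s worth a quick check that the export is readable. Here’s a minimal sketch with pandas; note that the exact column names vary by iMazing version, so just inspect whatever your header row contains:

# Quick sanity check on the iMazing export (column names vary by version)
import pandas as pd

texts = pd.read_csv("all_text_messages.csv")
print(f"{len(texts):,} messages")
print(texts.columns.tolist())
print(texts.head())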

Onwards! Nearly!

Everything is set up. It’s ALMOST time for action. Before we take the plunge, I suggest you take a moment to brush up on any areas where you don’t feel completely comfortable.

  • For a brilliant background to Deep Learning, Transformers and all things LLM, I can’t recommend highly enough the hugely popular videos from my great friend and colleague Jon Krohn; Jon is masterful at explaining deeply technical things in ways that just. make. sense.
  • The Hugging Face docs have APIs and Tutorials covering Transformers, Tokenizers, Text Generation and more.
  • We will be using QLoRA for fine-tuning. This is implemented in Hugging Face using PEFT (Parameter-Efficient Fine-Tuning) methods. Read all about it in the Hugging Face docs. You can also read the original LoRA paper (Low-Rank Adaptation of Large Language Models), and the QLoRA paper from last May. (A brief preview of the configuration appears after this list.)
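To give you a flavor of what’s coming, here’s roughly what the QLoRA setup looks like with PEFT. Treat this as a preview rather than the final recipe: the hyperparameters below are illustrative placeholders that we’ll tune in later parts.

# A preview of the PEFT / QLoRA configuration (hyperparameters are illustrative)
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank adapter matrices
    lora_alpha=32,                        # scaling factor for the adapter updates
    lora_dropout=0.05,                    # dropout applied within the adapters
    target_modules=["q_proj", "v_proj"],  # which attention projections to adapt
    task_type="CAUSAL_LM",
)

# Wrap the 4-bit base model so that only the small adapter weights are trained
peft_model = get_peft_model(base_model, lora_config)
peft_model.print_trainable_parameters()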

Part 2 is here. We’ll read the CSV, then organize and investigate our text history. In subsequent parts, we’ll curate the dataset, fine-tune the model, experiment with hyper-parameters, and finally, run generation. Then you can kick back, relax, and let your AI take over.
